# An Offline Adaptation Framework for Constrained Multi-Objective Reinforcement Learning

This is the code implementation for the paper [An Offline Adaptation Framework for Constrained Multi-Objective Reinforcement Learning]. It is mainly based on the [PDOA](https://github.com/qianlin04/PDOA) codebase.

## Prerequisites

- **Operating System**: tested on Ubuntu 18.04.
- **Python Version**: >= 3.8.11.
- **PyTorch Version**: >= 1.8.1.
- **MuJoCo**: install MuJoCo and mujoco-py version 2.1 by following the instructions in [mujoco-py](<https://github.com/openai/mujoco-py>).
- **Wandb**
- **DSRL benchmark**: [DSRL: Datasets and env wrappers for offline safe reinforcement learning](https://github.com/liuzuxin/DSRL)
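
The version requirements above can be checked before installation with a short script. The following is a minimal sketch, not part of the repository: the helper names (`parse_version`, `meets_minimum`, `installed_version`) are hypothetical, and it assumes PyTorch is installed under the distribution name `torch`.

```python
import re
import sys
from importlib import metadata

# Minimum versions taken from the prerequisites list.
MIN_PYTHON = (3, 8, 11)
MIN_TORCH = (1, 8, 1)


def parse_version(text):
    """Turn a string such as '1.8.1+cu111' into the tuple (1, 8, 1)."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad with zeros so that '2.0' compares as (2, 0, 0).
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(found, minimum):
    """Return True if the version string `found` is at least `minimum`."""
    return parse_version(found) >= minimum


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    print("python", python_version, "ok" if meets_minimum(python_version, MIN_PYTHON) else "too old")

    torch_version = installed_version("torch")
    if torch_version is None:
        print("torch is not installed")
    else:
        print("torch", torch_version, "ok" if meets_minimum(torch_version, MIN_TORCH) else "too old")
```

The script only reads package metadata, so it does not import PyTorch itself and also works in an environment where PyTorch is missing.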

## Installation

```
conda env create -f environment.yml
conda activate PDOA
pip install -e .
```

Install the D4MORL environments:

```
cd ./lib/utilities/morl/MOEnvs
pip install -e .
```



## Data Download and Generation

- For the DSRL dataset, you only need to install DSRL; the download starts automatically when the program is executed.

- For the D4MORL dataset, download it with the following commands:

  ```
  pip install gdown
  cd ./PDOA
  gdown --folder "https://drive.google.com/drive/folders/1wfd6BwAu-hNLC9uvsI1WPEOmPpLQVT9k?usp=sharing" --output data
  ```

  or download it from the [website](https://drive.google.com/drive/folders/1wfd6BwAu-hNLC9uvsI1WPEOmPpLQVT9k?usp=sharing) in a browser.

- For the CMO datasets, additionally run the CMO_generate code provided in our supplemental material.

**Note:** The D4MORL and CMO datasets should be placed in `./PDOA/data/d4morl` and `./PDOA/data/cmo`, respectively.
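
Before preprocessing, it may help to confirm that the datasets are in the expected locations. The sketch below is not part of the repository: the function name `missing_dataset_dirs` is hypothetical, and it only checks that each directory exists and is non-empty, without validating the dataset files themselves.

```python
from pathlib import Path

# Sub-directories named in the note above, relative to ./PDOA/data.
EXPECTED_DIRS = ("d4morl", "cmo")


def missing_dataset_dirs(data_root):
    """Return the expected dataset directories that are absent or empty."""
    root = Path(data_root)
    missing = []
    for name in EXPECTED_DIRS:
        directory = root / name
        # A directory counts as present only if it contains at least one entry.
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_dataset_dirs("./PDOA/data")
    if missing:
        print("Missing or empty dataset directories:", ", ".join(missing))
    else:
        print("All expected dataset directories are present.")
```

Run it from the directory that contains `PDOA`, or pass a different path to `missing_dataset_dirs` if the data is stored elsewhere.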



## Data Preprocessing

After obtaining the datasets, run `generate_test_dataset.ipynb` to generate the training set and the demonstration set used for evaluation.



## Training

- Run PDOA on all D4MORL datasets:

  ```
  cd ./PDOA
  bash run_bash/train_d4morl_env.sh
  ```

- Run PDOA on all DSRL datasets:

  ```
  cd ./PDOA
  bash run_bash/train_safe_env.sh
  ```

- Run PDOA on all CMO datasets:

  ```
  cd ./PDOA
  bash run_bash/train_cmo_env.sh
  ```

- Run a single experiment on a specific dataset:

  ```
  python train_eval.py --env MO-Ant-v2 --dataset_type amateur_uniform --dataset 'd4morl' --algo 'Diffusion-QL'
  ```
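
To launch several single experiments in sequence, the command above can be wrapped in a small driver. This is a rough sketch rather than a script shipped with the repository: `build_train_command` and `run_all` are hypothetical helpers, only the four flags shown above are used, and the only environment and dataset type listed are the ones from the example command.

```python
import shlex
import subprocess


def build_train_command(env, dataset_type, dataset="d4morl", algo="Diffusion-QL"):
    """Build the argument list for one train_eval.py run."""
    return [
        "python", "train_eval.py",
        "--env", env,
        "--dataset_type", dataset_type,
        "--dataset", dataset,
        "--algo", algo,
    ]


def run_all(envs, dataset_types, dry_run=True):
    """Print every env/dataset_type combination and optionally run it."""
    commands = [build_train_command(e, t) for e in envs for t in dataset_types]
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # Assumes the working directory contains train_eval.py.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Values taken from the example command; extend the lists as needed.
    run_all(["MO-Ant-v2"], ["amateur_uniform"], dry_run=True)
```

With `dry_run=True` the driver only prints the commands, which makes it easy to check them before starting any training.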

We also provide scripts for evaluating individual runs after training: `eval_d4morl_env.sh`, `eval_safe_env.sh`, and `eval_cmo_env.sh`. Note that the seed in these scripts serves as the run identifier and must be set to the seed used during training.
